Abstract. We developed a version of AutoTutor that helps struggling adult learners
improve their comprehension strategies through conversational agents. We
hypothesized that the accuracy and time to answer questions during the conversation
could be diagnostic of their mastery of different reading comprehension
components: words, textbase, situation model, and rhetorical structure. The results
show that adults' performance on more basic reading components (i.e.,
meaning of words) was higher than on the deeper discourse levels. In contrast,
time did not vary significantly among the theoretical levels. These results suggest
that adults with low literacy have greater mastery of basic reading levels than of
deeper discourse levels. Tracking performance on the four theoretical levels
can provide a more nuanced diagnosis of reading problems than a single overall
performance score and ultimately improve the adaptivity of an ITS like
AutoTutor.
1 Introduction


We developed a version of a web-based intelligent tutoring system (AutoTutor) for
adults with low literacy skills to improve their reading comprehension strategies at the
Center for the Study of Adult Literacy (CSAL). AutoTutor for CSAL has 35 lessons
that focus on distinct theoretical levels of reading comprehension articulated by
Graesser and McNamara [1]. For each lesson, the system starts out assigning words or
texts at a medium difficulty level and then asks 8 to 12 multiple-choice questions about
them. In this study, we tracked four theoretical levels (of the six defined in [1]). The
word level represents the lower-level basic reading components. The other three
theoretical levels (textbase, situation model, and rhetorical structure) represent deeper
discourse levels. We hypothesized that the accuracy and time on questions in AutoTutor
could be diagnostic of adults' mastery of comprehension components. Therefore, by
comparing accuracy and time on questions across the four theoretical levels, we can
detect adults' strengths and weaknesses in reading competencies.


This paper was presented at the International Conference on Intelligent Tutoring Systems, June 
11-15th, 2018. 
2 Methods


2.1 Participants


The participants were 52 adults recruited from CSAL literacy classes in Atlanta and
Toronto. They completed a 100-hour intervention over four months. Their ages ranged
from 16 to 69 years (Mean = 40, SD = 14.97). The majority of the participants were
female (73.1%). All participants read at 3.0-7.9 grade levels.


3 Measures and Data Analysis 


We extracted the adults' initial responses on medium-level questions in each of the 29
lessons that focused on the four theoretical levels. All adults answered these initial
medium questions before adaptively branching to easy or difficult questions in AutoTutor.
Each initial response included accuracy (1 or 0) and time to select an answer (in
seconds).
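
As an illustration, the extracted responses can be represented as one record per
participant and question and then summarized per theoretical level. The sketch below
is a minimal example under stated assumptions, not the analysis code used in the study:
the `Response` class, the `summarize` helper, and all sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Response:
    """One initial medium-level response (hypothetical record layout)."""
    participant: str
    lesson: str
    question: str
    level: str      # e.g. "word", "textbase", "situation model", "rhetorical structure"
    correct: int    # accuracy: 1 or 0
    seconds: float  # time to select an answer


def summarize(responses):
    """Group responses by theoretical level and compute descriptive statistics."""
    by_level = defaultdict(list)
    for r in responses:
        by_level[r.level].append(r)

    summary = {}
    for level, rows in by_level.items():
        acc = [r.correct for r in rows]
        sec = [r.seconds for r in rows]
        summary[level] = {
            "n": len(rows),
            "accuracy_mean": mean(acc),
            # Sample SD needs at least two observations.
            "accuracy_sd": stdev(acc) if len(acc) > 1 else 0.0,
            "time_mean": mean(sec),
            "time_sd": stdev(sec) if len(sec) > 1 else 0.0,
        }
    return summary


# Made-up data, for illustration only.
sample = [
    Response("p01", "lesson_a", "q1", "word", 1, 10.0),
    Response("p02", "lesson_a", "q1", "word", 0, 20.0),
    Response("p01", "lesson_b", "q1", "textbase", 1, 30.0),
    Response("p02", "lesson_b", "q1", "textbase", 1, 50.0),
]
summary = summarize(sample)
for level, stats in summary.items():
    print(level, stats)
```

Keeping participant, lesson, and question identifiers on every record preserves the
grouping structure that a mixed-effects analysis needs.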


We performed a descriptive analysis by exploring the means and standard deviations
of accuracy and time on questions of the four theoretical levels. We then fitted
mixed-effects models [2] to the two measures to test the differences among the four
theoretical levels, with question as the unit of analysis. The random effects were
participants, lessons, and questions; the fixed effect was theoretical level. Participants'
random slopes on the theoretical levels and random intercepts for the interaction
between lesson and question were also included in the models.
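
One plausible way to write the accuracy model described above is sketched below. The
equations are not given in the source, so the notation is an assumption introduced only
for illustration: Y_{pq} is participant p's accuracy on question q, k(q) is the
theoretical level of q (word as the reference level), and l(q) is the lesson containing q.

```latex
\begin{align*}
\operatorname{logit}\Pr(Y_{pq} = 1)
  &= \beta_0 + \beta_{k(q)}
   + u_{0p} + u_{k(q),p}
   + v_{l(q)} + w_{l(q),q}, \\
\beta_{\text{word}} &= 0, \qquad u_{\text{word},p} = 0, \\
(u_{0p}, u_{\cdot,p}) &\sim \mathcal{N}(\mathbf{0}, \Sigma_u), \qquad
v_{l} \sim \mathcal{N}(0, \sigma_v^2), \qquad
w_{l,q} \sim \mathcal{N}(0, \sigma_w^2).
\end{align*}
```

Here \beta_{k} is the fixed effect of level k, u_{0p} and u_{k,p} are the participant's
random intercept and random slope for level k, v_{l} is the lesson intercept, and
w_{l,q} is the intercept for a question within its lesson. Under the same assumptions,
the time model would use the same right-hand side as a linear predictor for answer time
plus a Gaussian residual term.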


4 Results 


Table 1 shows the means of accuracy and time on questions separately as a function of
the four theoretical levels. The pattern of scores indicates that performance is highest
and answer times are shortest for the word level (the reference level in the analysis)
compared to the three discourse levels (textbase, situation model, and rhetorical structure).


A Type II Wald chi-square test on the logistic mixed-effects model showed that
accuracies were significantly different (χ²(3) = 8.34, p = 0.04) among the four theoretical
levels. A post-hoc analysis with pairwise comparisons showed that only the pairs
involving the word level were significantly different. A Type III ANOVA with
Satterthwaite's approximation on the linear mixed-effects model showed that time did not
vary significantly among the four theoretical levels, F(3, 25.8) = 0.058, p = 0.981.
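
As a worked check on the accuracy result, the p-value can be recomputed from the
reported Wald chi-square statistic (8.34 on 3 degrees of freedom). The sketch below uses
the closed-form upper-tail probability of a chi-square distribution with 3 degrees of
freedom; the function name `chi2_sf_df3` is hypothetical, and the code only re-derives
the p-value from the reported statistic rather than refitting any model.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(8.34)
print(round(p_value, 2))  # prints 0.04
```

The recomputed value agrees with the reported p = 0.04, so the statistic falls just
beyond the conventional 0.05 threshold.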


Table 1. Means and Standard Deviations of Accuracies and Time


                    | Word         | Textbase     | Situation Model | Rhetorical Structure
No. of Questions    | 1455         | 1981         | 5049            | 5071
Accuracy   Mean     | 0.80         | 0.69         | 0.67            | 0.69
           (SD)     | (0.40)       | (0.46)       | (0.47)          | (0.46)
Time (s)   Mean     | [illegible]  | [illegible]  | 35.2            | 37.1
           (SD)     | (30.4)       | (30.2)       | (31.6)          | (38.1)
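
Because accuracy is scored 1 or 0, its standard deviation is determined almost entirely
by its mean: for a large number of responses it is approximately sqrt(p(1 - p)), where p
is the mean accuracy. The sketch below applies this to the accuracy means in Table 1 as
a consistency check. The helper name `binary_sd` is hypothetical, and the inputs are the
rounded means from the table, so agreement is only expected to two decimals.

```python
import math


def binary_sd(p: float) -> float:
    """Standard deviation of a 0/1 variable whose mean is p."""
    return math.sqrt(p * (1 - p))


# Mean accuracies as reported in Table 1.
accuracy_means = {
    "Word": 0.80,
    "Textbase": 0.69,
    "Situation Model": 0.67,
    "Rhetorical Structure": 0.69,
}

implied_sd = {level: round(binary_sd(p), 2) for level, p in accuracy_means.items()}
for level, sd in implied_sd.items():
    print(f"{level}: {sd:.2f}")
```

The implied values (0.40, 0.46, 0.47, 0.46) match the accuracy standard deviations
listed in the table.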


5 Discussion and Conclusion 


The logistic mixed-effects model indicates that adults' performance on the word level was
higher than on the three discourse levels. This likely occurred because word items focused
on individual words or single sentences, which place a low load on working memory,
whereas solving the items of deeper discourse levels is time-consuming, strategic, and
taxing on cognitive resources. The time that adults spent on questions was not
significantly different across theoretical levels, although times trended longer as
theoretical levels progressed.


This study provides a more nuanced diagnosis of adults' reading problems within a
multilevel reading comprehension framework than a single overall performance score
could provide. Future research should focus on designing standard reading tests and
establishing norms for adult populations based on the multilevel framework that affords
this diagnostically useful differentiation. By combining the test results with the norms,
researchers could develop more adaptive intelligent tutoring systems that provide
customized learning content to adults with low literacy.